LOW-RANK MATRIX RECOVERY VIA ITERATIVELY REWEIGHTED 
LEAST SQUARES MINIMIZATION 
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Abstract. We present and analyze an efficient implementation of an iteratively reweighted least squares 
algorithm for recovering a matrix from a small number of linear measurements. The algorithm is designed 
for the simultaneous promotion of both a minimal nuclear norm and an approximatively low-rank solution. 
Under the assumption that the linear measurements fulfill a suitable generalization of the Null Space Property 
known in the context of compressed sensing, the algorithm is guaranteed to recover iteratively any matrix with 
an error of the order of the best fc-rank approximation. In certain relevant cases, for instance for the matrix 
completion problem, our version of this algorithm can take advantage of the Woodbury matrix identity, which 
allows to expedite the solution of the least squares problems required at each iteration. We present numerical 
experiments which confirm the robustness of the algorithm for the solution of matrix completion problems, and 
demonstrate its competitiveness with respect to other techniques proposed recently in the literature. 

AMS subject classification: 65J22, 65K10, 52A41, 49M30. 
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1. Introduction. Affine rank minimization refers to the problem of finding a matrix of 
minimal rank consistent with a given underdetermined linear system of equations. This problem 
arises in many areas of science and technology, including system identification , collaborative 
filtering, quantum state tomography [THl [T7], signal processing, and image processing. An 
important special case is the matrix completion problem [6l [71 [33] , where one would like to fill 
in missing entries of a large data matrix which is assumed to have low-rank. 

Unfortunately, the affine rank minimization problem is NP-hard in general [311 [IH] ; there- 
fore, it is desirable to have tractable alternatives. In [If j . Fazel studied nuclear norm minimiza- 
tion for this purpose, which was known to be a good proxy for rank minimization. The nuclear 
norm of a matrix is the ^i-norm of its singular values, and is the smallest convex envelope of 
the rank function Reformulated as a semidefinite program, nuclear norm minimization 

can be solved with efficient methods [5]. 

1.1. Contribution of this paper. Unfortunately, standard semidefinite programming 
tools work efficiently for solving nuclear norm minimization problems only for matrices up 
to size approximately 100 x 100. Therefore, it is crucial to develop fast algorithms that are 
specialized to nuclear norm minimization (or other heuristics for rank minimization). So far, 
several approaches have been suggested ^51 [H [13 1201 [U] • Some aim at general nuclear norm 
minimization problems and others are specialized to matrix completion problems. Inspired by 
the iteratively reweighted least-squares algorithm analyzed in [lOj for sparse vector recovery, 
we develop a variant of this algorithm for nuclear norm minimization / low-rank matrix re- 
covery. Each step of the proposed algorithm requires the computation of a partial singular 
value decomposition and the solution of a (usually small) least squares problem, and both of 
these tasks can be performed efficiently. The analysis of our algorithm is based on a suitable 
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matrix extension of the null space property, well-known in the approximation theory literature 
in connection with i'l-minimization, see |311 I30' and reference therein, and recently popularized 
in the context of compressive sensing |12j . We show that the algorithm essentially has the same 
recovery guarantees as nuclear norm minimization. Numerical experiments also show that our 
algorithm is competitive with other state-of-the-art algorithms in the matrix completion setup 
[201 1211 [T5] . Unfortunately, the null space property fails in the matrix completion setup, and 
the theoretical analysis of the algorithm seems to be much more involved. Such a theoretical 
analysis will be postponed to later investigations. 

1.2. Low-rank matrix recovery and applications. In the following, we will deal with 
real or complex matrices indifferently and we denote the space oi n x p matrices by Mnxp- 
Given a linear map =5^ : Mnxp C™, with m <C pn, and a vector ^ G C™, we consider the 
affinc rank minimization problem 

min rank(X) subject to y{X) = (1.1) 

An important special case of low-rank matrix recovery is matrix completion, where .y samples 
entries, 

yix), = x,,, (1.2) 

for some i,j depending on £. Such low- rank matrix recovery problems often arise; examples 
include the Netflix problen^or the recovery of positions from partial distance information [5]. 
We refer to |6l [7j for further details. As an example of low-rank matrix recovery from more 
general linear measurements, we can turn to quantum state tomography |18j . where one tries 
to recover a quantum state, that is, a square matrix X e M^xm that is positive semidefinite 
and has trace 1. A pure state has rank 1, and a mixed state is of low-rank, or approximately 
low-rank. Then one has given a collection of matrices Ak S Mnxm k = 1, . . . ,n^, (so called 
Pauli-Matrices) and takes partial "quantum observations" = Tr(A^.X), j = 1, . . . ,r with 
r < n'^, and the task is to recover a low-rank state X. We refer to [T8l[T7] for details. 



1.3. Theoretical results. As already mentioned the rank minimization problem ( |1.1[ ) is 
NP-hard in general, and therefore we consider its convex relaxation 

^min \\X\\^ subject to ^(X) = (1.3) 

where ||A"||* = '^ii-^) denotes the nuclear norm (or Schatten-1 norm, or trace norm), 

where ai{X) are the singular values of X. There are two known regimes where nuclear norm 
minimization can be guaranteed to return minimal-rank solutions: 

1. RIP measurement maps. The (rank) restricted isometry property (RIP) is analogous 
to the by-now classical restricted isometry property (RIP) from compressed sensing [51 134|. 

Definition 1.1 (Restricted Isometry Property [31]). Let y : Mnxp — > C™ be a linear 
map. For every integer k, with 1 < k < n, define the k-restricted isometry constant 5k = 
Sk{y) > to be the smallest number such that 

ii-Sk)\\x\\j,<\\y{x)\\j^<ii + 5k)\\xrF 

holds for all k-rank matrices X . 

It is shown in [34 that nuclear norm miminization ( 1.3 1 recovers all matrices X of rank at 
most k from the measurements ^ = y{X) provided S^^k is small enough. Below we improve 
this to (54fc < — 1, and we refer to [3 HHj for related results. 



^ Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going 
to enjoy a movie based on their movie preferences http://www.netflixprize.com/ 



The restricted isometry property is known to hold for Gaussian (or more generally sub- 
gaussian) measurement maps ,5^ in [341 [5] . These are maps of the form 



^(X), = ^af,fc,,Xfe,j, ^=l,...,m, (1.4) 

where the a^^^j- are independent normal distributed random variables with mean zero and 
variance 1/m. Such a map satisfies ^fe < (5 G (0, 1) with high probability provided 

'Ti > C5 max{p, n}/c, (1-5) 

see Theorem 2.3 in [S]. Since the degrees of freedom of a rank fc matrix X £ Mnxp are 
k{n + p ~ k), the above bound matches this number up to possibly a constant. Therefore, the 



bound (1.5) is optimal. 

Using recent results in (TJ [52] , it follows that the restricted isometry property also holds for 
certain structured random matrices if slightly more measurements are allowed; in particular, 



for maps of the form (1.4 1 where the ai,k.j ~ unraveled for each £ into vectors of length pn 
- correspond to m rows randomly selected from the pn x pn discrete Fourier matrix (or 2D 
Fourier transform matrix) with randomized column signs, Sk < (5 G (0, 1) with high probability 
as long as 

m > C5 max{p, n}fc log^(pn). (1-6) 

This follows from the recent findings in [2 [^2 that such random partial Fourier measurements 
satisfy a concentration inequality of the form, for all < e < 1, 

¥[\\\S^{X)r-\\X\\%\>s\\X\\%)<2exp{^^C,log-\pn)). (1.7) 



Subgaussian measurement maps also satisfy (1.7), and for these maps, the factor of log ^(pn) 



in the exponent can be removed. The proof of RIP for subgaussian random ensembles appear- 
ing in Theorem 2.3 in [5] (see also Theorem 4.2 in [35) relies only on such concentration for 
subgaussian measurement maps. Therefore, obvious modifications to that proof to accommo- 



date the additional logarithmic factors in the exponent (1.7) give the stated RIP results for 
random partial Fourier measurements. 



2. Matrix completion. In the matrix completion setup ( 1.2 ) where measurements are point- 
wise observations of entries of the matrix, there are obvious rank one matrices in the kernel 
of the operator ^5^; therefore, the RIP fails completely in this setting, and 'localized' low-rank 
matrices in the null space of cannot be recovered by any method whatsoever. However, if 
certain conditions on the left and right singular vectors of the underlying low-rank matrix are 
imposed, essentially requiring that such vectors are uncorrelated with the canonical basis, then 
it was shown in [BJ [71 133j that such incoherent matrices of rank at most k can be recovered 
from m randomly chosen entries with high probability provided 

m > Cfcmax{7i,p}log^(max{?i,p}). 

Up to perhaps the exponent 2 at the log-term, this is optimal. We refer to [6l[7l[33] for detailed 
statements. In [151 IIZI extensions to quantum state tomography can be found. 

2. Notation and Preliminaries. 

2.1. Notation. The entries of a matrix X e Mnxp are denoted by lower case letters and 
the corresponding indices, i.e., Xij = Xij. We denote the adjoint matrix X* e Mp^n as usual. 
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In the following and without loss of generality, we assume that n < p. For a generic matrix 
X G Mnxp we write its singular value decomposition 

X = C/EF* 

for unitary matrices U G Mnxn,V G Mpxp and a diagonal matrix E = diag (cri, cr„) G 
Mnxp-, where cri > cr2 ^ ■ • • > are the singular values. In the following, we denote cr(X) 
the vector of the singular values of the matrix X. For a specific matrix X we indicate the 
singular value decomposition as X = Ux^xVx- ^or self-adjoint matrices X = X* , we have 
V — U. We write X ;^ if X is self-adjoint and positive-definite. In this case we can define, 
for a e M, the a-th power of the matrix X hy X" = UT,°'U* , where E" = diag(cr^', . . . , ct"). 
If X is positive semi-definite, we write X ^ 0. 

The rank of X S Mnxp denoted by rank(X) equals the number of nonzero singular values 
of X. We will say that X is a k-rank matrix if rank(X) < k. The trace of a square matrix 
X e Mnxn is the sum of its diagonal entries, i.e., Tr(X) = Y^^=i^ii- The trace satisfies 
Tr(X) = Tr(X*) and is cyclic, that is, Tt{XY) = Tt{YX) for all matrices X, Y with matching 
dimensions. If A; are the eigenvalues of X then Tr{X) — ^i- Denote by Eij the matrix 
whose (i, j)-entry is 1 and all other entries take the value 0. Then TT{EijX) = Xji. 

The space of complex matrices Mnxp forms a Hilbert space when endowed with the natural 
scalar product {X,Y) = Tr(Xy*), which induces the Frobenius norm \\X\\f = (X, X)^/^. The 

(\ "'"/■^ 1/2 
X^iLi X]j=i NijP ) — (^1=1^^) ■ More generally, 

we consider the Schatten g-norms ||X||s, \\a{X)\\iri _2j [19], which are based on the 

vector norms ||a;||^ii (X]r=i l^d'^)^^''' 1 5; 9 < oo, and |la;|j£ri^ :— max"^j^ \xi\. Of particular 
importance is the Schatten 1-norm, or nuclear norm, which we also denote by \\X\\.^, — \\X\\s-^ = 
S"=i The operator norm ||X||op = H-'^Hs^ will be needed as well. 

If XX* is invertiblc, then the absolute value of X is defined by 

\X\ = {XX*)-^^^XX*= (2.1) 

in this case, the Schatten g-norm satisfies 

:=(Tr|X|'')i/' (2.2) 

for all 1 < g < oo. 

We consider linear operators on matrices of the type .y : Mnxp C™. We denote 
the action of ^ on the matrix X by .y{X) in order to distinguish it from yX which may 
create confusion and be interpreted as a matrix-matrix multiplication in certain passages below. 
Rather, as the vector space Mnxp is isomorphic to the vector space C"^^, linear operators of 
the form ^ : Mnxp — ^ C™ may be interpreted as elements of Mmxnp- We denote by y* the 
adjoint operator of y, such that A)c". = {X, y*{X)) for all X e Mnxp, A € C™, where 

the former scalar product is the Euclidean one on C™ and the latter scalar product is the one 
inducing the Frobenius norm. We denote by / := /„ S Mnxn the identity matrix. 

Finally, let us define the k-spectral truncation of X by 

X[k] = UT,[f,]V*, 

where E[fc] = diag((Ti, . . . , Ufc, 0, . . . , 0). Thus, this operation acts by setting to all the singular 
values with indexes from fc -|- 1 to n. We also introduce the £-stabilization of X by 

X^^U^eV*, (2.3) 

where E^ — diag(max{(Tj, e}). That is, under this operation, all singular values below a certain 
minimal threshold are increased to the threshold. 
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For a self-adjoint matrix X = UTiJJ* G M„xn, note that 

X^ = UY.U* + el, (2.4) 

where E — diag (maxjcrj — £,0}). In particular, if a significant number of the singular values 
of X fall below the threshold e, then Xf, is decomposed into the sum of a low-rank matrix and 
a scaled identity matrix. 



2.2. Motivation and formulation of the algorithm. Let us fix a sampling operator 
: M„xp C" an 
minimization problem 



■S^ : Mnxp ^ C™ and a measured data vector e C™. We are interested in the rank 



arg min rank(X). (2.5) 

As mentioned in the introduction, this problem can be efficiently solved under suitable assump- 
tions on by considering its convex relaxation 

arg min IIXIL, (2.6) 
where rank minimization is substituted by nuclear norm minimization. 



We propose an algorithm for solving ( 2.6 ) reminiscent of iteratively reweighted least squares 



algorithm for linearly-constrained ^i-norm minimization, and is based on the following moti- 



vation: if all of the singular values of X G Mnxp are nonzero, then, according to (2.1|-(2.2| 
we may re-write its nuclear norm as 



IXIL = llXlIci =Tr 



{XX*)-^'^{XX*)\ = \\W^/^X\\l, (2.7) 



where W = {XX*) This suggests the following approach for solving the nuclear norm 



minimization problem (2.6): let us initialize a weight matrix W'^ G Mnxn, and then recursively 
define, for = 0, 1, . . . , 

X^+i=arg min \\iW'y/^X\\l, and W'+' = (x'+' {X'+')*y'^\ (2.8) 

Observe that the minimization X = argmin^(x)=^ can be reformulated as 

a weighted least squares problem with m linear constraints on the np variables Xij] each of 
the updates for X^~^^ and W^^^ can then be performed explicitly. However, once any of the 
singular values of X^^^ become small - a desirable situation as we seek to recover low-rank 
matrices - the computation of W^^^ becomes ill-conditioned. 

To stabilize the algorithm, we fix a parameter e > 0, and increase the small singular values 
of X^ according to (Ti(X^) — > max{cri(X^), e} after updating X^ but before computing . 



That is, we replace X by its e-stabilization X\ = UT,iV* , as defined in (2.3). The idea is 
that X^ is well-conditioned and, additionally, if X^ is well-approximated by a fc-rank matrix, 
then X^ is well-approximated by the same fc-rank matrix. Keeping e > fixed throughout the 
iterations, we would no longer expect the algorithm to converge to the nuclear norm solution 



(2.6). Nevertheless, if we allow for e = — > in such a way that ei < ak{X ) is maintained 
at each iteration, then one may hope for both stability and convergence towards a fc-rank 
solution. Collecting these ideas together, we arrive at the following iterative reweighted least 
squares algorithm for low-rank matrix recovery (IRLS-M): 
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IRLS-M algorithm for low-rank matrix recovery: Initialize W'^ = I E Af„xn- Set 
£o := 1, K > k, and 7 > 0. Then recursively define, for £ — 1,2, ... , 

X'-.^arg min \\iw'-'y^''xfp, (2.9) 

min {ef_i, 7(7/^+1 (X^}. (2.10) 

Compute G Af„xn and — diag (cr^, ct^) e Af„xn for which 

X^(XO* = C/^(E^)^([/0*- (2.11) 

Set = U\j:ly\uy , (2.12) 



where denotes the e-stabilization of the matrix E, see ( |2.3[ ). The algorithm stops if 
£f = 0; in this case, define X^ := X^ for j > £. In general, the algorithm generates an 
infinite sequence (X^)^gN of matrices. 



Reduction to iteratively reweighted least squares for vector recovery. Suppose now that 
n = p and the linear constraints ^{X) — ^ are equivalent to the m x n linear system 
Sx = ^ exclusively acting on the vector x = diagX, and xij = is enforced for i ^ j. Then 
the nuclear norm minimization problem ( |2.6| reduces to that of finding the vector x G C" of 
minimal i!"-norm which solves the optimization problem 

arg^mii^||z||^j. (2.13) 

In this context, the updates for X^'^^ and W^^^ in the IRLS-M algorithm reduce to vector 
updates x^+'^ e C" and w^+'^ G 

Although iteratively reweighted least squares algorithms for solving constrained ^" min- 



imization problems of the form (2.13) have been around for half a century, it was not until 
recently (10] that conditions on the matrix S G Af,„xri were provided under which an it- 
eratively reweighted least squares algorithm could be proven to converge to the solution of 



the minimization problem (2.13). In the algorithm proposed in [10] . the weight vector 
is updated slightly differently than in the current setting; in |10) it is updated according to 
wl = (Ixf p -f ef)~'^/'^, while our algorithm uses the update rule 

' ' £^ ^, else. 

Despite this difference, the assumptions, which we shall require on the measurements in 



order to prove convergence of the IRLS-M algorithm to the nuclear-norm solution (2.6), draw 



upon the the variational framework introduced in |10j . However, our update rule for the 



weights (2.2) is reminiscent of a previously studied IRLS algorithm [S], which was introduced 
in the slightly different context of total variation minimization as opposed to £1 minimization. 
However, in that algorithm, the parameter = e > is fixed throughout the iteration; the 
corresponding iterations are shown to converge to a solution x = x{e), and one can show 
that x{£) converges to the £i-norm solution a:;* as e — >■ provided that x* is unique. For an 
overview of iteratively reweighted least squares minimization, we refer the reader to |10j. 

2.3. Convergence results. As a consequence of the main result of this paper. Theorem 



6.11 we have the following convergence result for the IRLS-algorithm under the assumption 
that the measurement map has the restricted isometry property in Definition |1.1[ 

Proposition 2.1. Consider the IRLS-M algorithm with parameters 7 ~ 1/n and K eN. 
Let 5^ : M^xp C™ be a surjective map with restricted isometry constants S^k, ^ak satisfying 
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V — i^s^K ^ ^ K-'2 ' */ ihere exists a k-rank matrix X satisfying ■y{X) — ^ with 

k < K — the sequence (X^)^^^ converges to X. 

Actually, Theorem |6 . 1 1 1 proves something stronger: the IRLS-M algorithm is robust, in the 
sense that under the same conditions on the measurement map , the accumulation points 
of the IRLS-M algorithm are guaranteed to approximate an arbitrary X G M„xp from the 
measurements ^ — .y{X) to within a factor of the best k-rank approximation error of X in 
the nuclear norm. 

The IRLS-M algorithm also has nice features independent of the affine measurements. In 
particular, the parameter si can serve as an a posteriori check on whether or not we have 
converged to a low-rank (or approximately low-rank) matrix consistent with the observations: 

Remark 2.2. //e^-i > e and ee = jcrK+i{X^) = e, then \\X^ - X^^j||oj, < e/7. In 
particular, if ee — 0, then X^ is a K-rank matrix. 



3. Numerical Experiments. In the matrix completion setup ( 1.2 1 where the affine mea- 
surements are entries of the underlying matrix, the restricted isometry property fails, and so 
Proposition |2.1| does not apply. Nevertheless, we illustrate in this section numerical experi- 
ments which show that the IRLS-M algorithm still works very well in this setting for recovering 
low-rank matrices, and is competitive with state-of-the-art algorithms in the matrix completion 
setup; namely, the Fixed Point Continuation algorithm (FPCA), as introduced in |15j, and the 
OptSpace algorithm, as introduced in |20j . For further numerical experiments comparing an 
algorithm closely related to IRLS-M with other matrix completion algorithms such as Singular 
Value Thresholding [3], we refer the reader to [55]. 
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Fig. 3.1. Comparison of IRLS-M, FPCA, and OptSpace over a class of randomly-generated rank-lO 
matrices. 



1. In Figure |3]l, we generate rank fc = 10 matrices X of dimensions n — p — 500, and of 
the form X — UY,V* , where the rows oiU £ iV/fexn and of F e Af^xn are drawn independently 
from the multivariate standard normal distribution, and the elements of the diagonal matrix 
E are chosen independently from a standard normal distribution (note that U, S, and V do 
not comprise a singular value decomposition, although this process does generate matrices of 
rank k = 10). We plot out of 50 trials (a) the average relative error ||X — Xlji^/HXHi? and 
(b) the average running time in reconstructing such matrices from m = nnp randomly chosen 
observed entries (matrix completion). To produce Figure [3j2, we repeat the same procedure, 
except with the rank of the underlying matrices now fixed to be fc = 30. 
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Fig. 3.2. Comparison of IRLS-M, FPCA, and OptSpace over a class of randomly-generated rank-30 
matrices. 



2. Figures[3]3 and[3j4 are analogs of FiguresjSjl and[3]2 respectively, generated by matrices 
X of rank fc = 10 and 30 and of dimension n ^ p ~ 500 using the same model. But now the 
observed entries Xij = Xi^ + .lAfi.j are corrupted by additive noise; the A/ij are independent 
and identically distributed standard normal random variables. 
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Fig. 3.3. Comparison of IRLS-M, FPCA, and OptSpace over a class of randomly-generated rank-lO 
matrices subject to measurement noise. 



3. In Figures[3]5 and|3]6 we use IRLS-M, FPCA, and OptSpace to reconstruct grayscale im- 
ages from partial pixel measurements. We are not suggesting that matrix completion algorithms 
should be used for image inpainting, but that we use images as a more realistic (nonrandom) 
set of matrices which are well-approximated by low-rank matrices. In general, grayscale images 
correspond to matrices with sharply-decaying singular values. In all experiments, the observed 
pixel measurements are chosen independently from the uniform distribution. The displayed 
images were constructed by running each algorithm using the partial pixel measurements, and 
then thresholding all negative pixel values to be zero. 

Our results indicate that IRLS-M outperforms OptSpace for reconstructing low-rank and 
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Fig. 3.4. Comparison of IRLS-M, FPCA, and OptSpace over a class of randomly-generated rank-30 
matrices subject to measurement noise. 
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(d) IRLS-M reconstruction (c) FPCA reconstruction (f) OptSpace reconstruction 



Fig. 3.5. Figure \3.5(a)\ is t he original 256 X 256 Camera man image. Figure \3. 5(h )\ is its best rank- 
16 approximat ion. Figure \3.5(c)\ displays the 50% randomly- distributed partial pixel measurements. Figures 



\3.5(d) \3.5(e)\ and \3.5'(f) are the result of applying IRLS-M, FPCA, and OptSpace, respectively, using these 
partial measurements and with input rank r = 16, followed by thresholding all negative pixel values to zero 
The relative error (with respect to the Frobenius norm) of the best rank-Id approximation, IRLS-M, FPCA 
and OptSpace, respectively are .1302, .1270, .1646, and .2157. 



approximately low-rank matrices whose singular values exhibit fast decay. Our results indicate 
that IRLS-M is competitive with FPCA in accuracy and superior to FPCA in accuracy once the 
number of measurements is sufficiently high. Nevertheless, IRLS-M can be slower in speed than 
FPCA. Figures[3j5 and[3j6 suggest that IRLS-M has the potential to reconstruct approximately 
low-rank matrices whose singular values exhibit sharp decay more accurately than FPCA. 

Remark 3.1. Much of the literature on matrix completion emphasizes the difference 
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(a) Original image (b) Best rank-20 approxima- (c) 30% observed pixels 




(d) IRLS-M reconstruction (e) FPCA reconstruction (f) OptSpace reconstruction 



Fig. 3.6. Figure \3.6(a)\ is the original 375 X 500 Fruit Bowl image. Figure \3.6(b)\ is its best rank- 
20 approximation. Figure \3.6(c)\ displays the 30% randomly- distributed partial pixel measurements. Figures 
\3.6(d)\ \3.6(e)\ and \3.6(f)\ are the result of applying IRLS-M, FPCA, and OptSpace, respectively, using these 
partial measurements with input rank r = 20, followed by thresholding all negative pixel values to zero. The 
relative error (with respect to the Frobenius norm) of the best rank-20 approximation, IRLS-M, FPCA, and 
OptSpace, respectively are .0955, .1066, .1131, and .2073. 



between "hard" matrix completion problems and "easy" matrix completion problems, as de- 
termined by the parameters n,p,k, and m; generally speaking, "hard" problems correspond 
to the situation where the number of measurements m is close to the theoretical lower bound 
TO = Cfcmax{n,p}log^(max{n,p}). For more details, we refer the reader to ^20 . We note that 
the range of parameters n = p, to, and k considered in Figure [Sjl and Figure [3j2 corresponds 
to a transition from "hard" to "easy" problems. 

Remark 3.2. In all above numerical experiments, we ran the IRLS-M algorithm with 
parameter 7 = 1 as this greedy choice gives the best speed of convergence in practice, even 



though we could only prove convergence of the algorithm for 7 = 1/n as in Proposition 2.1 We 
stopped the IRLS-M iterations when either 200 iterations have been reached or \ei — ee-i\/ee < 
10^^ for more than 50 consecutive iterations £. These are the default parameters on the Matlab 
code we have made available onlinej^ (One part of the code is implemented in C because Matlab 



prohibited fast implementation of the sequence of problems in(4.5l below.) 

We ran the OptSpace algorithm using the publicly-available Matlab code by the author^ 
and using the default parameters provided therein. Finally, we ran the FPCA algorithm 
using publicly-available Matlab code by the author^ We found that the running time of 
this algorithm improved considerably when we changed the settings to the "easy" problem 
settings, even when the default setting was "hard" , and we report the performance of FPCA 
with this modification. 

Remark 3.3. All three algorithms are "rank- aware" , meaning that the underlying rank r 



http : //www, elms .nyu. edu/~r«ard/IRLSM. zip 



http : / /www. Stanford, edu/^raghuram/optspace/code .html 
^ Kttp : //www . Columbia . edu/~sm2756/FPCA . htm 

10 



is provided as input. In practice most matrices of interest are only approximately low-rank, and 
the best choice of r is not known a priori. In the image reconstruction examples in Figures [3j5 
and[3j6, we picked a rank k and number of measurements m for each image from a collection 
of several different attempts based on visual inspection. More generally, a systematic way to 
select the rank is using cross-validation^ see |36j for details. 

4. Computational Considerations. 

4.1. Updating W. In order to perform the weight update, 

one does not need to compute all n singular vectors of , but rather only the first r, where 
r is the largest integer for which > £t/^- This is because decomposes as the sum of a 
rank r matrix and a perturbation of the identity. In practice, it is sufhcicnt to take 7 = 1 (this 
was done in the numerical experiments of the last section); in this case, r is on the order of K, 
and decomposes into the sum of a low-rank matrix and a perturbation of the identity. 

4.2. Updating X. Consider now the matrix update, 

X = arg min J{X,W). 

In the matrix completion setting, the operator 5^ is separable^ i.e., acts columnwise, 

y{x) = (j5^iXi, . . ..^pXp) = (^1, . . . ,^p) = ^ e C", 

where Xi, i = 1, . . . ,p, are the columns of X, and J?i, i — 1, . . . ,p are suitable matrices acting 
on the vectors Xi. In this case \\W^/^X\\l = YlLi so that 

X = arg min J(X,W) ^ X, = aig min \\W^^'^XA\%, i^l,...,p. 
Note that 

X,^W'^.9':{^,W-^^*Y^Ji,, i = l,...,p, (4.1) 
where the actions here are simply matrix multiplications. To speed up the computation of 



{c9^iW ^-S^i ) one may easily parallelize the independent column updates (4.1). 



4.2.1. The Woodbury matrix identity. The updates in (4.1) can be further expe- 
dited by exploiting the Woodbury matrix identity, first introduced in [38 . It states that for 
dimensions r < n, if A e M„xn, U G M„xr, C € Mrxr, and V G Mnxr, then 

{A + UCV*)-^ = A-^ - A~^U{C-^ + V*A-^U)-W*A-\ (4.2) 

assuming all of the inverses exist. 

We may write the inverse weight matrix as the sum of a diagonal rank-r matrix and 
a perturbation of the identity: 

W^^ = eI + UT.[r]U* 

where Sj^j = (S — £l)[r\ = diag(max{crj — e,0}). If 7 = 1 is set in the algorithm, then 
W^^ decomposes into the sum of a fC-rank matrix and a perturbation of the identity during 
iterations t where et < £e-i- The matrix Jy^iW~^y* may then be expanded as 
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(4.3) 



Recall that ,5^i € Mmixn, and that ™i — ™- Setting Qi — ,S^i,5^* and Ri = yiU, and 

applying the Woodbury matrix identity, 

iS^^WW*)-' = e-'Qr' [l - R,{e^^,.l + R*Qr'R,y' R:Qr'] . (4.4) 

As is a diagonal matrix, its inverse is trivially computed. In the setting of matrix com- 
pletion, the operator is simply a subset of rows from the n x n identity matrix, and 
■S^iS^^ = / G Mm^xnii, and Ui ~ Ri ~ ,5^iU £ A^m^xr is a subset of m.i rows from the 



matrix U. Therefore, utilizing the formula (4.4), this reduces to 

(^,M/-i^;)-i = [/ - t/.(£S[;i + U*U,y'u*] , (4.5) 

and one essentially needs to solve a linear system corresponding to the matrix + U*Ui € 
Mj-xr as opposed to a system involving the inversion of a matrix in Mjmxmi- When 7 = 1 and 
r = K + 1, such as at iterations £ where Si < e^-i, the average number of measurement entries 
in any particular column is at least m — 0{klognp) 3> fc, so this amounts to a great reduction 
in computational cost. 

5. Variational Interpretation. 

5.1. A functional of matrices and its derivatives. In this section we show that the 
low-rank matrix recovery can be reformulated as an alternating minimization of a functional 
of matrices. Assume X e M„xp, ^ W = W* G Mnxm and consider the functional, 

J{X, W) \ {\\W''^X\\l + \\W-^'^l) . (5.1) 

In the following subsections we shall prove that the IRLS-M algorithm may be reformulated 
as follows. 



IRLS-M algorithm for low-rank matrix recovery: Initialize by taking :~ I E 

Mnxn- Set £q '■= 1, K > k, and 7 > 0. Then recursively define, for £ = 1, 2, ... , 

A:^:=arg min J(X,W'^), 

:= min{Q_i,7CTK+i(A:'^)} . (5.2) 
The up-date of the weight matrix W*^ follows a variational principle as well, i.e., 

W^-.^arg min J{X'^,W) (5.3) 

The algorithm stops if ee = 0; in this case, define X^ :— X'^ for j > £. In general, the 
algorithm generates an infinite sequence {X^)g^fq of matrices. 



5.1.1. Optimization of J' v^rith respect to X. Let us address the matrix optimization 
problems and explicitly compute their solutions. For consistency of notation, we need to 
introduce the following left-multiplier operator : Mnxp ~^ M^xp defined by yV~^{X) := 
W~^X. With this notation we can write explicitly the solution of the minimization of with 
respect to X. 

Lemma 5.1. Assume that W £ Mnxn o,nd W — W* >- 0. Then the minimizer 

X = arg min J(X, W) 
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is given by 



Proof. For W = W* >- 0, we have 



X = arg min 



\\w'^^x\\l 



if and only if there exists a A € C™ such that 



(A) = WX, or {WX, H) = 0, for all H e ker y. 



(5.4) 



This equivalence can be shown directly: assume X optimal and H e ker^, then 

= Ty{WXX*) < Tt{W{X + H){X + H)*) 

= Tv{WXX*) + Ti{WHH*) + Tv[W{HX* + XH*)]. 

Actually X is optimal if and only if Tr[W{HX* + XH*)] = for all H e ker J^. By the 
properties of the trace we have 

Tr{WHX*) = Tr[{WHX*)*] = Tv{XH*W) = Ti{WXH*) = Tv{H*WX). 

Hence Ti[W{HX* + XH*)] = if and only if ^e{{WX,H)) = 0. This observation al- 
lows us to compute X explicitly. If wc assume for a moment that actually {WX,H) = 
for all H € ker^, then necessarily WX = 5^*{X), and we obtain X = W'^ o y*{X). 
Hence ^ = y{X) = ^ o o y*(\) and \ = o W'^ o S^*\-^[Jl). Eventually 
X = o y* o\y oY^-^ o y*]-^) {.£) satisfies by construction (WX,H) = 0, hence 

die{{WX,H)) = 0, and the optimality condition holds. □ 

5.1.2. Optimization of J with respect to W. In this section we address the solution 
of the following optimization problem 



Let us collect a few preliminary lemmas in this direction. 

Lemma 5.2. Under the constraint W = W* >- 0, the Frechet derivative of J with respect 
to its second variable is 



W = arg 



mm 



J{X, W). 



dwJ{X, W) = XX* - W 



-2 



(5.5) 



Proof. Let us consider the Gateaux differentiation with respect toV = V*, 




= Tv{VXX*) = {XX\V). 



Since the self-adjoint and positive-definite matrices form an open set, we have 



dw\\W^^^X\\l=XX*. 



(5.6) 
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It remains to address the Frechet derivative of \\W ^^^Wp = Tr(W^ ^). We have 
d 



dV 



^ lim /i-i {Tr[(VK + hVy^] - Tv{W-^)] 



= Tr 



[W + hV)-^ -W-^ 



hm 



Note that {W + /iV^) 



(W+hV)-^-W-^ 
h 



hm 



= -VW-^, so 



-w-^vw- 



By using the cycUcity of the trace we obtain ^^^IIf — ^1^)1 £^nd therefore 
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(5.7) 



This latter statement can also be obtained by applying the more general result |24l Proposition 
6.2]. Putting together and ([5J| yields Js^. □ 

Using a similar argument as in Lemma |5.2[ we arrive at the next result. 

Proposition 5.3. Consider X — UY;V* , where E = diag(cri, ...,(t„). Let e > 0. Then 



arg min J{X, W)^:W ^ UY^-^U* 



(5.8) 



We recall that = diag(max{CTj, e}). 

Proof. First observe that J^{X, W) = cx) if is not invertible so that 

Ty = arg min J(X,W). 

That is, the constraint of positive-definiteness can be relaxed to positive semi-definiteness. 
Now, the set Cq — {W € M„xn ^ W — W* ^ e~^I} is a convex set, and may be 
reformulated as 



Co = {We Af„x„ ■.O^W = W*, \\W\\op < e-^} 



= {W e M„y,„ -.0 diW ^W*, sup \\W 



1/2„||2 _ 



sup \\Wu\\e^ < £-1}. 
l!«llf?=i 



Let Ci £ M" denote the i*^ canonical basis vector, and note that ||J7ei||f" = 1. Then it is clear 
that Co C Ci , where Ci is the convex set 

Ci^{W e M„^,r-O^W ^W*, \\W^/^Ue,\\\<e~\ z = l,...,n}. 

The Lagrangian corresponding to the optimization of over the convex set Ci is given by 



C{X,W,\) 



i=l 



by weak duality [31 p. 226] the minimum of J{X, •) over Ci can be bounded from below by the 
Lagrangian dual function, 

sup 5(A) < min J{X, W), g{\) min C{X, W, A). 

(Ai,...,A„)>o M'eCi weci 
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Following an argument similar to Lemma 5.2 Wx — argmin^vgCi C{X,W,X) agrees with the 
solution to the Karush-Kuhn- Tucker conditions, 



XX* + XiUEuU* = W-"^, (5.9) 
1=1 

or 

if e Ci, or equivalently if A.^ > maxje^ — tT^,0}. A straightforward calculation shows that 
A = argmax;^.>„iax{e2_^2 o}5(A) satisfies 



A,; 



0, else, 



while 5(A) = \ lla,<e (^4^) + E^.>£ <^^■ But at W-x = J/diag (erf + \,)-^/^U* , we have the 
equality g{\) = J'{Wx, X). By duality, must realize the minimal value of J^{X, •) over the 
set Ci. As W\ is also contained in Cq and as Cq C Ci, it follows that T4^^ also minimizes J^{X, •) 
over the original convex set Cq. We set W — W^- □ 

6. Analysis of the Algorithm. 

6.1. Basic properties of the iterates. We start with simple properties of the iterates 
that hold without any requirements. In order to show the actual convergence of the IRLS-M 
algorithm along with the relationship to nuclear norm minimization, we will assume additional 
properties of the measurement map in the next subsection. 

Proposition 6.1. Assume that {X^,W^,ei)i^ji is the output of the main algorithm. Then 
the following properties hold. 

(i) J(X'^+^,W'^+^) < J{X'^, W^), for all i G N; 
(ii) J{X'^,W^) > IIX^IU, for allie N; 
(Hi) There exists a constant .s^f > such that 

2^ [J{X', W') - J{X'+\W'+^)\ > \\X'+^ - X'Wl. (6.1) 

This in particular implies 

lim \\X^+^ - X^Wl ^ 0. (6.2) 

Proof, (i) By the optimality properties of X^^^ and W^~^^, along with the admissibility of 
in the optimization of W^^^ (as e^+i < ee), we obtain the following chain of inequalities 

JiX^+\W^+') < J{X^+\W^) < J{X\W^). 

(ii) A direct computation shows 

(^f)^ + £f 



J{X\W') 



where r is the largest integer for which > e^. Since (erf )^ + > 2^^ • erf, we have 

J{X',W') > \\x'\\,. 
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(iii) We have the following estimate, 

= {W'X\X'}-{W'X'+\X'+') 

= {W'{X' + X'+^),X' - X'+') + 2i^m{{W'x', X'+')). 



By optimality of X'''+^ and in virtue of we can add = -2{W'^X'^+^,X'^ - X'^+^) and 

= 2iQm{{W'^{X'^+^ ~ X^), X'^+i)) (note also that - X^+i e ker^), and we obtain 

2 [J{X', W') - J{X'+\w'+^)\ > {W'{X' - X'+^),X' - X'+') 

= \\iW'y/^X' - X'+')\\l. 

On the one hand, 

\\x' - x'+'\\l < uw'y^'ix' - x'+')\\%\\{w')-'/'\\l, 

and 



On the other hand, 



o[<\\X%<J{X^,W''). 



Hence, there exists a constant ^ > such that ^^^Hf < ^ for all ^ G N. It follows 

from (6.1 1 that 



N 



J2 - < 2i/ W°) - J{X^+\W^+^)] < 2j^JiX°, W°). 



This implies that the sum on the left converges as TV — ^ oo, and (6.2) follows. □ 

6.2. Null space property. We need to enforce additional conditions in order to obtain 
a simultaneous promotion of the approximation to both minimal rank and minimal nuclear 
norm solution. For this purpose we recall the rank null space property, which is the analog to 
the null space property for the vector case in compressed sensing [51 [1^1 HH [32] ■ 

Definition 6.2. A map y : M„xp -> C™ satisfies the rank nuU space property (RNSP) 
of order k if for all H G kcr .y \ {0} and all decompositions H = Hi + H2 with k-rank Hi it 
holds 



\Hi\\* < \\H 



2\\*- 



The rank null space property is closely linked to the nuclear norm minimization problem 
min subject to y{X)=.^^. (6.3) 

X G Mn X p 

The following theorem has been shown in |351 Theorem 3]. 

Theorem 6.3. Let y : Mnxp C™ be a linear map. Then, every k-rank X G Mnxp is 



the unique solution of the problem (6.3) for the datum y( — y(X) if and only if y satisfies 
the RNSP of order k. 
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We also need a slightly stronger condition. 

Definition 6.4. A map : Mnxp C™ satisfies the strong rank null space property 
(SRNSP) of order k with constant rj G (0, 1) if for all X G herJ^ \ {0} and all decompositions 
X = Xi + X2 with k-rank Xi there exists another decomposition X = Hi + H2 such that Hi 
is of rank at most 2k, (iJi, ^2) = 0, XiH^ = 0, XIH2 = 0, and 

\\Hi\U<v\\H2\\*- 



Remark 6.5. The above definition is equivalent to \2(A Definition H.4-]: Let GL{U,V) 
be the subspace of n x p matrices of rank at most k, whose row space belongs to the span of 
V G Mnxk o,nd luhose column space belongs to the span of U ^ Mnxk- Let Sk — {GL{U,V) : 
U £ M„xfc, V G Mpxk, U*U = V*V = I}. For T € GL{U, V), we let Vt be the projection onto 
T, and denotes the orthogonal complement of T in Mnxp- Then a map : M„xp ^ C™ 
satisfies the strong rank null space property (SRNSP) of order k with constant rj G (0, 1) if 

IIPtWII* <?7||7't-II* 

for all T e Sk and H e ker y \ {0}. 

Further, we introduce the best k-rank approximation error in the nuclear norm, 

Pfe(X)* mm \\X-Z\\^. 

2eM„xp:i'ank(Z)<fc 

It is well-known that the minimum is attained at the /c-spectral truncation Z = X^^.] ■ The 
SRNSP implies the following crucial inequality. 

Lemma 6.6 (Inverse triangle inequality). Assume that : M„xp ~^ satisfies the 
SRNSP of order k and constant rj £ (0, 1). Let X, Z be matrices such that 

y{x) = y{z). 

Then we have the following inverse triangle inequality 

\\X - Z|U < [\\Z\U ||X|U + 2pfe(X)0 . (6.4) 

This result was independently obtained in |261 Lemma II. 5]. Inverse triangle inequalities are 
well-known in the context of the classical null space property for vectors Uni, see also the 
survey paper |,.31j . 

Proof. Let := be the best fc-rank approximation of X in the nuclear norm, and set 
Xc^X~Xq. Let H ^ X-Z ^ Xo + Xc-Z eker.y. Set Xi = Z-X^, so that = Xq-Xi. 
By the definition of the strong null space property, there exists a decomposition H = Hq + He 
with rank(iJo) < 2fc, {H p, He) = 0, XqH* = 0, X^H^ = and such that ||i?o||* < ^||-ffc|U- 
Hence, by Lemma |6. 10| recalled below, we have 

\\Xo-He\U = \\He\U + \\Xo\U 
and, as Hq + Xi = Xq — He, we have equivalently 

||i^c||* + !|^0||* = ||ffo+^l||* < ||ffo||* + ||^l||*. 

The inequality |li?o||* ^ vWHcW* yields 

||ifcll* + ||^olU<ll^i||*+'7l|i^cll*, 
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or, equivalently, 



1-7? 



1^1 I 



^oll*) 



Now, using Xi = Z — Xc, we obtain 



\X ~Z\\, = \\Ho + i/c||* < (1 + v)\\Hc\\* < 



1 + ,, 

I — rj 



l^i||*-||^o||*) 



< ^{\\z\u - \\Xo\U + \\Xc\U) < ^(ll^ll* -\\x- x,\u + \\x,\U) 



< 



1 — rj 

l + TJ 

1^ 



i\\z\U-\\x\u + 2\\x,\U). 



The proof is completed by noting that 



\x - x„\u = Pk{x),. □ 



Note that if X is fc-rank then pll{X).^, = and (6.4 1 reduces to 

1+7? 



\X-ZL < 



1-7? 



(Il^l 



I^IU)- 



(6.5) 



Now we easily conclude that the SRNSP implies stable low-rank recovery via nuclear norm 
minimization. 

Corollary 6.7. Suppose =5^ : M„xp ~^ satisfies the strong null space property of 
order k with some constant rj G (0, 1). Let X € Mnxp o,nd ^ = 5^{X). Then a solution X of 
(6.3) satisfies 

\\X-X\\^<2\±^pu{X)^. 
1 - 77 

/7i particular, every k-rank X G Mnxp is the unique solution of the nuclear norm minimization 
problem (6.3). Consequently, the strong rank null space property of order k with some constant 
7? < 1 implies the rank null space property of order k. 

Proof Clearly \\X\\^ < ||X||, and ^ yields < 2(1 + 7?)/(l - 77)pfe(X),. Since 

Pk{X)^, — for all fc-rank X, the latter estimate implies exact recovery of such matrices. 
Theorem 6.3 implies also that satisfies the rank null space property of order k. □ 

The SRNSP is somewhat difficult to analyze directly. As in the vector case [12 , it is 
implied by the (rank) restricted isometry property. Definition [lT] 

Proposition 6.8. Assume that : Mnxp C™ has restricted isometry constant 



54fe < \/2 - 1 w 0.41 . 
Then satisfies the SRNSP of order k with constant 



(6.6) 



e(0,i) 



Our proof uses [SJ Lemma 3.3] and |3U Lemma 2.3], which we recall here for the reader's 
convenience. 

Lemma 6.9 (Lemma 3.3 in [S]). Let y : Mnxp C™ with restricted isometry constant 
< Then for two matrices X,Y G Mnxp with rank(X) + rank(y) < k and {X,Y) = it 
holds 



\{y{x),y{Y))\<s,\\xy\\Yy. 
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Lemma 6.10 (Lemma 2.3 in |34j ) . Let X and Z be matrices of the same dimensions. If 
XZ* =0 andX*Z = then 



\\x + z\u^\\x\\. + \\z\u. 

Proof, [of Proposition 6.8 For convenience we give a simple proof, which results in the 



stated non-optimal constant in (6.6 1, along the lines of the proof of Theorem 2.6 in [35]. Let 
X £ ker^ \ {0} and Xi be of rank at most k, so that, by setting X2 = X — Xi, we have 
X = X1+X2. Let 



.0 0. 

be the (full) singular value decomposition of Xi with S being a fc x fc diagonal matrix. Set 
H = U*XV and partition 



H 

where Hu £ M^xk- Now, set 



Hii H12 
H21 H22 



""-"i^^i "^y- 

Then rank(_ffo) < 2fc, XiH* = 0, XlHc = 0, and {Ho,Hc) = as desired. It remains to 
prove that ||-ffo||* < ^/ll-ffc||*- Let H22 have singular value decomposition H22 = U diag{a)V* , 
in particular, the entries of a are in decreasing order. Now we choose an integer £ > and 
decompose the vector a into vectors a^-^\ j = 1, 2, . . . , of support length £ defined by 

^U) if^(j-l) <*<^j, 

' 1^ 0, otherwise . 
Clearly, X),- ^^^^ = a. We set, for j = 1, 2, . . . , 



^^ -^(0 ^diag('a(^")y* ' ^* 



Then rank(i?j) < £, = Y.j>i Hj and {Hq, Hj) = for aU j = 1, 2, . . . . Also, observe that 
since H e ker^ we have J^(lro + -^i) = T,j>2'^i-H])- Now we first use that ||i7o||* < 
•\/2fc||i?o||F < V2k\\HQ + Hi\\ by orthogonality of Hq and Hi, and estimate the latter. To this 
end, we use orthogonality of Hq and Hi, the RIP and Lemma [6. 9[ 

11^0 + HiWl < W-YiHo + Hi)\\% = {.y{HQ + Hi),Y, 

1 - 02k+e 1 - (>2k+e 

= — E(^(^o + Hi),y{H,)) < -%^iii/„ + hiWfY. \\h,\\f. 

1 - 62k+e 1 - d2k+i 

Hence, \\Hq + Hi\\p < ^j>2 II^jII-F- Since the sequence a is nonincreasing, we have 

by definition of a^^^ that, for j > 2 and for all i, crp"''"'^^ < | He'-"'-' |Ui — j\\Hj\\*, and therefore, 

1/2 




1 

71' 



Combining our estimates we obtain 



-I^^WU. (6.7) 



1 - S2k+e V ^ 



In the last step we applied Lemma 6.10 Now we choose £ — k and obtain ||Ho||* < ??ll-ffc|l* 
with 

' 1-S3k 

Since < 64k, we have 77 < 1 if 64k < \/2 — 1. □ 

We note that different choices of £ in ( |6.7[ ) (say £ — 2k, or £ — \k/2]) lead to slightly 
different conditions. Also, we do not claim optimality of the constant — 1. We expect that 
with a more complicated proof as in [S] or [T31 El US] , one can still improve this value. Our 
goal here was to rather provide a simple proof. After submission of an initial version of our 
paper, we became aware of [29] . where an elegant relationship between vector and matrix cases 
is shown. In particular, properties and recovery guarantees valid for sparse vector recovery by 
means of £i-minimization are shown to imply corresponding properties and recovery guarantees 
for low-rank matrices by means of nuclear-norm minimization. It seems that together with [25] 
this implies slightly better guarantees than provided in Proposition |6.8[ 

6.3. Analysis of the algorithm when satisfies the null space property. Before 
presenting the main result of this paper, we introduce another functional, dependent on a 
parameter e > 0, 

n 

MX) Je(S(X)) = ^/(a,;(X)), (6.8) 

1=1 

where 

' ^"^ = 1 u<s. 

Theorem 6.11. Consider the IRLS-M algorithm for low-rank matrix recovery, with pa- 
rameters 7 = 1/n and K CzN. Let 5^ : Mnxp — ^ be a surjective map. Then, for each set of 
measurements ^ G C™, the sequence of matrices (Ar^)^gN produced by the IRLS-M algorithm 
has the following properties: 



(i) Assume that 5^ satisfies the strong rank null space property of order K (Definition 6.4 )■ 
//lim£_j.oo £e = 0, then the sequence {X^)i^j^ converges to a K-rank matrix X agreeing 
with the measurements; in this case, X is also the unique nuclear norm minimizer. 

(a) // lim£_>.oo ££ = e > then every subsequence of X^ has a convergent subsequence. 
Each accumulation point X o/(A"^)^>i coincides with a minimizer X of the functional 
subject to ■y{X) = ^ . In particular, if this minimizer is unique then the full 
sequence (A'^)^>i converges to it. 

If, i n addition, ,5^ satisfies the strong rank null space property of order K (Definition 



6.4) and constant r/ < 1 — , then each accumulation point X = X of {X 
satisfies, for any matrix X such that ,S^(X) — ^ , and for k < K — 

\\X - X\\, < Kpk{X),, 
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where 



A 



4(1 + ^)^ 2(1 + 7?) 

(l-ry)2((X-fc)(l-r/)-2?7) 1 - ' 



( in) In particular, ij satisfies the strong rank null space property of order K with constant 
rj < \ — -j^zjij Oixii if there exists a k-rank matrix X satisfying S^{X) = ^ , then 
necessarily e = 0. 

Proof. Note that since < e^+i < ei, the sequence (e^)^^^ always converges, 
(i) If £<i = for some £, then we set X = X^. If > for all £, then there exists a 
subsequence, such that 



ee.+i < El,. 



and ££^+1 = ^(TK+i{X'^'^ 

Hence, we can extract a further subsequence, denoted again by {X 



Because of Proposition 6.1 (i) and (ii), {X^^^^)j is bounded. 

)j for simplicity, which 



7.1 



converges to some X = limj_^oo X^ . Since limj_>oo ^ij+i = 0, an application of Theorem 
immediately yields aK+i{X) = 0, and X is a if-rank solution of ,5 ^{X) = ^ . 

The strong rank null space property together with Corollary 6.7 implies that X is the 
unique nuclear norm minimizer X. Next we show that the whole sequence {X^)t converges to 
X. We have 



J{X\W^ 



i=l i=r{i) + l 



£\2 



2ee 



where we recall that r{£) is the smallest integer for which af < egifi > r{£). Since J'{X^, W^, ei) 
is a monotonically decreasing sequence and J{X^\ W^' ,ei.) — )■ \\X\\^ for j — oo, we also have 
J{X^, W"-, eg) — ll^ll* for £ — )• oo. We have the estimates from above and below 



J{X',W\si) 



( E\2 _|_ 2 

E {^^^^^l)<\\X%<J{X^'.W\e,). 



i=r(l) + l 

As £ — cx), erf < £^ — ^ for i > r{£) + 1, and we compute 



and we obtain 



hm y " f 



2e, 



a^) < hm V ( ^^--^) < lim nei = 0, 



i=r(£) + l 



lim IIAT^IL = \\X\ 



Since X is a if-rank matrix, we can apply the inverse triangle inequality (6.51 to obtain 

1 +7? 



[X' -x\ 



< 



l-rj 



(II^'IU-II^I 



(6.9) 



oo in (6.9), we obtain lim, 



X'^X. 



where < 1 by assumption. Taking the limit for 

(ii) In the case that \im£^oa = £ > we use the functional jZe introduced in (6.8 1. 
Observe that ; M" — > M+ is diffcrentiable, convex, and absolutely symmetric, see Section 
|7.2[ By Proposition |7.4[ we therefore have, for X with singular value decomposition X = 
Udiag{a{X))V*, 

VJ,(X) = ;7diag(j;(a,(X)))l/*, 
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where the derivative of is given by 
We have 



sgn(w), |u| > s, 
w/e, \u\ < e. 



X = X(e)earg min J,{X) 

if and only if (Vj7e(X), H) = Q for all H E kci S^. It is now straightforward to verify that the 
latter is equivalent to 

{WX, H) = 0, for ah H e ker^, (6.10) 

where W = [{{XX*y^^)e]-\ Here, we recall that is the e-regularization of Z, see ( |2.3p . 

Since X^ is a bounded sequence, it has accumulation points. Let us first show that any 
accumulation point of X^ is a minimizer of J7e subject to the constraint ^{X) ~ .4^. Note 
that jTg is not strictly convex, so such a minimizer is not necessarily unique. Let {X^^ be 
any convergent subsequence and X its limit. Note that W^^ = [((X^j (X^^ ]~^. Hence, 

W^^ depends continuously on X^^ so that also W^^ converges to a limit W ~ linij^oo = 

[iixx*)'/% 



by m 



6.1 



By invoking Proposition 
{WX,H) = lim {W^^X^^+\H 



iii) we have also X^^^^ — > X for j — > oo, hence 



0, for all H G ker y. 



This is exactly the optimality condition (6.10) and therefore X — X is a, minimizer of j7e(X). 

We now prove the error estimate. To begin, let X^^ be a converging subsequence of X^ 
with limit X, which, for simplicity we denote again by X^. As just outlined X — X is a, 
minimizer of J7e subject to ^{X) = ^ . Note that for any matrix X satisfying the constraint 
S^{X^ = and for any minimizer X of subject to the constraint ^{X) = 



\\X\U < Je{X) < Je{X) < \\X\\,+ne, 
where we have used the fact that X is a minimizer of J7e, so that 

ll^ll* - ll^ll* < ne. 
By the inverse triangle inequality. Lemma 6.6 and ( |6.11[ ) 



\\X -X 
Thus we reach 



< 



\X\U+2p,{X),) < 



V 



{ne + 2pk{X), 



(6.11) 



(6.12) 



ne 



lim ne^ < lim afc+i{X 



It follows from Proposition |7.2| that 

[K + l- k)ne <{K + 1- k)aK+i{X) < \\X - X\\^ - 



Pk{X)^ 



< 



1 



1] 



1 — rj 



{ne + 2pkiX),)+pkiX), 



Under the assumption that K — k > , this inequality yields 



4(1 + rj) 

- {l-rj){{K-k){l-rj)-2rj)P'^^^* 



Plugging this back into (6.121, we arrive at the desired result. □ 
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6.4. Discussion of related work. While we were finishing this paper, Maryam Fazel 
informed us of her joint work with Karthik Mohan, where a similar iteratively least squares 
minimization algorithm is studied |26j . We shortly outline the differences between our contri- 
bution and [55] . In a direct and faithful generalization of the algorithm analyzed in [TU] 



for sparse vector recovery is provided. In particular, instead of (2.121 

W' = {7^(diag(max{a^^£a))''(f/0*, 
the update rule for the weights is given there by 



On the one hand, the drawback of our up-date rule (2.12) is that is it not anymore equivalent 
to an unconstrained minimization of the energy 

J{X', W) = J{X\ W) + ^WW^'Yf. 

with respect to the second variable W >- but to the constrained minimization of J{X^ , ■) 



as indicated in (5.3 1. Such equivalence had to be proven in Lemma 5.2 and Proposition |5.3[ 
representing an additional technical difficulty in order to develop a proof of convergence, and 
introducing an element of novelty with respect to |10) . On the other hand, our choice is 
motivated by a significant improvement of the complexity of the algorithm when the operator 
is separable^ i.e., acts columnwise, 

y{x) = {^iXi, ypXp) = (^1, . . . , = ^ € c", 

where Xi, i = 1, . . . ,p, are the columns of X, and J?i, i = 1, . . . ,p are suitable matrices acting 
on the vectors Xi. We repeat that this case includes relevant situations such as the matrix 
completion problem. As clarified in Section pi. 2. 1[ the Woodbury formula, which can applied to 



the matrices S^i{W^)~^y* , thanks to the particular structure of the matrices as in (2.121, 
allows for their inversion with a much more economical cost than it would be possible by using 
the up-date rule proposed in Moreover, while the analysis in [55] also partially addresses 
reweighted least squares algorithms for Schatten norms, for g < 1, in the case of the nuclear 
norm (g = 1) the authors prove in |26l Theorem II. 7] only an analogous version of our Theorem 



6.11 (i), and do not develop a counterpart to (ii) and (iii). 
7. Appendix. 

7.1. Basic properties of singular values. In [S^ Weyl proved the following stability 
estimate on the singular values. 

Theorem 7.1. Let X,Y e Mnxp be fixed. Then 

\a,{X) - a,iY)\ <\\X-Y\\f, ^ = 1,2,.... 



Proposition 7.2. If X e Mnxp and Y e Mnxp, then for any j, we have 

\\\X-Xy]\U~\\Y~Y[,]m<\\X-Y\U, 

and for any J > j, we have (J — j)aj{X) < \\X — Y\\^, + \\Y — ^jj]]!*- 
Proof. The first statement is a simple chain of inequalities: 

\\x-Xy]\u<\\x-Y[,]\u < ||x-y||, + ||r-Y[,]||,. 

The result follows as we may reverse the roles of X and Y. For the second inequality, it suffices 
to note that {J - j)aj{X) < \\X - Xy^W.,. □ 
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7.2. Differentiation and singular values. Let V„ denote the set of all n x n permu- 
tation matrices. A function / : M" — > M is called absolutely symmetric if 

f{PDx) = f{x) for all x G M", P G Vn, 

and for all diagonal matrices D having only the values +1,-1 on the diagonal. A function 
F : Mnxp — > is called unitarily invariant if F{U*XV) for all X S Mnxp, and for all 
unitary matrices U E Mnxn, V G Mpxp- The following fact is well-known, see for instance [24l 
Proposition 5.1]. 

Proposition 7.3. A function F : Mnxp ~^ n < p, is unitarily invariant if and only 
if F{X) = f{a{X)) for some absolutely symmetric function f : M" M. Here, cr(X) G M" 
denotes the vector of singular values of X . 

Let / : M" M be absolutely symmetric. Then the function F{X) = f{a{X)) is convex 
if and only if / is convex. We have the following result concerning differentiation of F [211 
Proposition 6.2]. 

Proposition 7.4. Let f : M" -^M. be an absolutely symmetric and convex function. The 
function F = f o a on Mnxp is difjerentiable at X G Mnxp if and only if f is differentiable at 
(y{X). In this case 

VF{X)^Udiag{Vf{a{X))V*, 

where X =^ U diag{a{X))V* for unitary matrices U, V . 
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